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SCHUR COMPLEMENT BASED DOMAIN DECOMPOSITION 
PRECONDITIONERS WITH LOW-RANK CORRECTIONS * 

RUIPENG LI , YUANZHE XI , AND YOUSEF SAAD t 


Abstract. This paper introduces a robust preconditioner for general sparse symmetric matrices, 
that is based on low-rank approximations of the Schur complement in a Domain Decomposition (DD) 
framework. In this “Schur Low Rank” (SLR) preconditioning approach, the coefficient matrix is first 
decoupled by DD, and then a low-rank correction is exploited to compute an approximate inverse of 
the Schur complement associated with the interface points. The method avoids explicit formation 
of the Schur complement matrix. We show the feasibility of this strategy for a model problem, and 
conduct a detailed spectral analysis for the relationship between the low-rank correction and the 
quality of the preconditioning. Numerical experiments on general matrices illustrate the robustness 
and efficiency of the proposed approach. 

Key words, low-rank approximation, the Lanczos algorithm, domain decomposition, symmetric 
sparse linear system, parallel preconditioner, Krylov subspace method 


1. Introduction. We consider the problem of solving the linear system 

Ax = 6, (1-1) 

with A S a large sparse symmetric matrix. Krylov subspace methods pre¬ 

conditioned with a form of Incomplete LU (ILU) factorization can be quite effective 
for this problem but there are situations where ILU-type preconditioners encounter 
difficulties. For instance, when the matrix is highly ill-conditioned or indefinite, the 
construction of the factors may not complete or may result in unstable factors. An¬ 
other situation, one that initially motivated this line of work, is that ILU-based meth¬ 
ods can yield exceedingly poor performance on certain high performance computers 
such as those equipped with GPUs [12] or Intel Xeon Phi processors. This is because 
building and using ILU factorizations is a highly sequential process. Blocking, which 
is a highly effective strategy utilized by sparse direct solvers to boost performance, is 
rarely exploited in the realm of iterative solution techniques. 

In the late 1990s, a class of methods appeared as an alternative to ILUs, that did 
not require forward and backward triangular solves. These were developed primarily 
as a means to bypass the issues just mentioned and were based on finding an approx¬ 
imate inverse of the original matrix, that was also sparse, see, e.g., 11151 na among 
others. These methods were, by and large, later abandoned as practitioners found 
them too costly in terms of preprocessing, iteration time, and memory usage. 

Another line of work that emerged in recent years as a means to compute pre¬ 
conditioners, is that of rank-structured matrices. The starting point is the work by 
W. Hackbusch and co-workers who introduced the notion of 'H-matrices in the 1990s 
mdi. These were based on some interesting rank-structure observed on matrices 
arising from the use of the fast multipole methods or the inverses of some partial 
differential operators. A similar rank-structure was also exploited by others in the 
so-called Hierarchically Semi-Separable (HSS) matrix format which represents certain 
off-diagonal blocks by low-rank matrices [TlISlIIllEollMllsolEII. 
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More recent approaches did not exploit this rank structure but focused instead on 
a multilevel low-rank correction technique, which include the recursive Multilevel Low- 
Rank (MLR) preconditioner [^, Domain Decomposition based Low-Rank (DD-LR) 
preconditioner [23j, and the LORASC preconditioner m- This paper generalizes the 
technique developed in |21j to the classical Schur complement methods and proposes 
a Schur complement based Low-Rank (SLR) correction preconditioner. 

This paper considers only symmetric matrices, and the proposed spectral analysis 
is restricted to the Symmetric Positive Definite (SPD) case. However, the method can 
be extended to symmetric indefinite matrices as long as they have an SPD interface, 
i.e., the submatrix associated with the interface unknowns resulting from the parti¬ 
tioning is SPD. This assumption usually holds for matrices arising from discretization 
of PDEs. Extensions to the symmetric indefinite matrices with indefinite interface 
matrices, as well as to nonsymmetric matrices are also possible but these will be 
explored in our future work. 

It is useful to compare the advantages and the disadvantages of the proposed 
approach with those of the traditional ILU-type and the approximate inverse-type 
preconditioners. First, the SLR preconditioner is directly applicable to the class of 
distributed linear systems that arise in any standard Domain Decomposition (DD) 
approach, including all vertex-based, edge-based, or element-based partitionings. Sec¬ 
ond, it is well suited for single-instruction-multiple-data (SIMD) parallel machines. 
Thus, one can expect to implement this preconditioner on a multiprocessor system 
based on a multi(many)-core architecture exploiting two levels of parallelism. Third, 
as indicted by the experimental results, this method is not as sensitive to indefinite¬ 
ness as ILUs or sparse approximate inverse preconditioners. A fourth appeal, shared 
by all the approximate inverse-type methods, is that an SLR preconditioner can be 
easily updated in the sense that if it does not yield satisfactory performance, it can 
easily be improved without forfeiting the work performed so far in building it. 

The paper is organized as follows: in Section we introduce the DD framework 
and Schur complement techniques. A spectral analysis will be proposed Section 
The SLR preconditioner will be discussed in Section followed by implementation 
details in Section Numerical results of model problems and general symmetric 
linear systems are presented in Section and we conclude in Section 

2. Background: sparse linear systems and the DD framework. In m 

we introduced a method based on a divide-and-conquer approach that consisted in 
approximating the inverse of a matrix A by essentially the inverse of its 2 x 2 block- 
diagonal approximation plus a low-rank correction. This principle was then applied 
recursively to each of the diagonal blocks. We observed that there is often a decay 
property when approximating the inverse of a matrix by the inverse of a close-by 
matrix in other contexts. By this we mean that the difference between the two inverses 
has very rapidly decaying eigenvalues, which makes it possible to approximate this 
difference by small-rank matrices. The best framework where this property takes 
place is that of DD which is emphasized in this paper. 


2.1. Graph partitioning. Figure 2.1 shows two standard ways of partitioning 
a graph |26j . On the left side is a vertex-based partitioning that is common in the 
sparse matrix community where it is also referred to as graph partitioning by edge- 
separators. A vertex is an equation-unknown pair and the partitioner subdivides the 
vertex set into p partitions, i.e., p non-overlapping subsets whose union is equal to the 
original vertex set. On the right side is an edge-based partitioning, which, in contrast. 
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consists of assigning edges to subdomains. This is also called graph partitioning by 
vertex separators in the graph theory community. 


Fig. 2.1. Two classical ways of partitioning a graph, vertex-based partitioning (left) and edge- 
based partitioning (right). 




From the perspective of a subdomain, one can distinguish 3 types of unknowns: 
(1) interior unknowns, (2) local interface unknowns, (3) and external interface un¬ 
knowns. This is illustrated on the top of Figure |2.2| In a vertex-based partitioning, 
interior unknowns are those coupled only with local unknowns; local interface un¬ 
knowns are those coupled with both external and local unknowns; and external inter¬ 
face unknowns are those that belong to other subdomains and are coupled with local 
interface unknowns. In an edge-based partitioning the local and external interface 
are merged into one set consisting all nodes that are shared by a subdomain and its 
neighbors while interior nodes are those nodes that are not shared. 


Fig. 2.2. A local view of a distributed sparse matrix: vertex-based partitioning (top-left), edge- 
based partitioning (top-right), and the matrix representation (bottom). 



For both types of partitionings, the rows of the matrix assigned to subdomain i 
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can be split into two parts: a local matrix Ai which acts on the local unknowns and an 
interface matrix which acts on the external interface unknowns (shared unknowns 
for edge-based partitioning). Local unknowns in each subdomain are reordered so 
that the interface unknowns are listed after the interior ones. Thus, each vector of 
local unknowns Xi is split into two parts: a subvector Ui of the internal components 
followed by a subvector yi of the interface components. The right-hand-side vector hi 
is conformingly split into subvectors fi and gi. Partitioning the matrix according to 
this splitting, the local system of equations can be written as 





0 



( 2 . 1 ) 


Here Ni is the set of the indices of the subdomains that are neighboring to i. The term 
Eijyj is a part of the product which reflects the contribution to the local equation 
from the neighboring subdomain j. The result of this multiplication affects only local 
interface equations, which is indicated by the zero in the top part of the second term 
of the left-hand side of (2.1). If we denote by the set of the local interface unknowns 
of subdomain i, then the global interface y is given by 3^ = ljf=i y and g 

be the subvectors of x and h corresponding to y. Note that in the case of the vertex- 
based partitioning, we have = 0, for i ^ j such that • • • , yJl and 


9 ^ = 




>yp 

If we stack all interior unknowns ui,U 2 , ■ ■ ■ ,Up into a vector u 


in this order, and we reorder the equations so that u is listed first followed by y, we 
obtain a global system that has the following form: 
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or a more compact form, 



( 2 . 2 ) 


(2.3) 


An illustration is shown in Figure [Z3| for the vertex-based and the edge-based par¬ 
titionings of 4 subdomains for a 2-D Laplacian matrix. Each of these two partitioning 
methods has its advantages and disadvantages. In the present work, we will focus on 
the edge-based partitioning, but this approach is also applicable to the situation of a 
vertex-based partitioning. 


A popular way of solving a global matrix in the form of (2.3) is to exploit the 


Schur complement techniques that eliminate the interior unknowns Ui first and then 
focus on computing in some way the interface unknowns. A novel approach based on 
this principle is proposed in the next section. 


2.2. Schur complement techniques. To solve the system (2.3) obtained from 


a DD reordering, a number of techniques rely on the following basic block factorization 




B E 
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with S = C — E'^B ^E, 


(2.4) 













Fig. 2.3. An example of a 2-D Laplacian matrix which is partitioned into 4 subdomains with 
edge separators (left) and vertex separators (right), respectively. 



nz = 4992 
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where S' S is the ‘Schur complement’ matrix. If an approximate solve with the 
matrix S is available then one can easily solve the original system by exploiting the 
above factorization. In this case note that this will require two solves with B and 
one solve with S. In classical ILU-type preconditioners, e.g., in a two-level ARMS 
method an approximation to the Schur complement S is formed by dropping 
small terms and then an ILU factorization of S is obtained. In contrast, the SLR 
preconditioner introduced in this paper approximates the inverse of S directly by the 
sum of C~^ and a low-rank correction term, resulting in improved robustness for 
indefinite problems. Details on the low-rank property for S~^ —C~^ will be discussed 
in the next section. 

3. Spectral analysis. In this section we study the fast eigenvalue decay prop¬ 
erty of S~^ — C~^. In other words, our goal is to show that S~^ « -l-LRC, where 
LRC stands for low-rank correction matrix. 

3.1. Decay properties of S~^ — C~^. Assuming that the matrix C in 
SPD and C = is its Cholesky factorization, then we can write 

S = L{I- L-^E^B-^EL-'^) = L{I - H)L^. (3.1) 

Consider now the spectral factorization oi H S M®^® 

H = L-^E^B-^EL-'^ = (3.2) 

where U is unitary, and A = diag (Ai,..., Ag) is the diagonal matrix of eigenvalues. 
When A is SPD, then H is at least Symmetric Positive Semi-Definite (SPSD) and the 
following lemma shows that the eigenvalues Ai’s are all less than one. 

Lemma 3.1. Let H = L~^E"^B~^EL~'^ and assume that A is SPD. Then we 
have 0 < Ai < 1, for each eigenvalue Xi of H, i = 1,..., s. 

Proof. If A is SPD, then B, C and S are all SPD. Since an arbitrary eigenvalue 
X{H) of H satisfies 

X{H) = x{c-^e'^b-^e) = X{C-\C -S)) = l- X{C-^S) < 1, 


2.21 is 
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and H is at least SPSD, we have 0 < Ai < 1. □ 

From (3.11, we know that the inverse of S reads 


S-^ =L-'^{I (3.3) 

Thus, we wish to show that the matrix (/ — H)~^ can be well approximated by an 
identity matrix plus a low rank matrix, from which it would follow that 5*“^ « C~^ + 
LRC as desired. We have the following relations, 

(/ - H)-^ - / = L^S-^L -1 = - C-^)L = X, (3.4) 

from which we obtain: 


5-1 =C-^ +L-^XL-\ 


(3.5) 


Note that the eigenvalues of X are the same as those of the matrix S~^C — I. Thus, 
we will ask the question: Can X be well approximated by a low rank matrix? The 
answer can be found by examining the decay properties of the eigenvalues of X, which 
in turn can be assessed by checking the rate of change of the large eigenvalues of X. 
We can state the following result. 

Lemma 3.2. The matrix X in {'3-4) has the nonnegative eigenvalues 0k = A/c/(l — 
Afe) for fc = 1, • • • , s, where Xk is the eigenvalue of the matrix H in (3.2). 

Proof From (3.4) the eigenvalues of the matrix X are (1 —Afc)“^ —1 = Afe/(1 —A/c). 
These are nonnegative because from Lemma |3.1| the A^’s are between 0 and 1. □ 
Now we consider the derivative of 9k with respect to Xk- 


dOk _ 1 

dXk " (1 - XkY ■ 


This indicates a rapid increase when the Xk increases toward one. In other words, 
this means that the largest eigenvalues of X tend to be well separated and X can be 
approximated accurately by a low-rank matrix in general. Figure |3.1| illustrates the 
decay of the eigenvalues of the matrix L~"'"XL~^ and the matrix X for a 2-D Laplacian 
matrix, which is precisely the matrix shown in Figure [273| As can be seen, using just 
a few eigenvalues and eigenvectors will represent the matrix X (or XL~^) quite 
well. In this particular situation, 5 eigenvectors (out of the total of 127) will capture 
82.5% of X and 85.1% of L~^XL~^, whereas 10 eigenvectors will capture 89.7% of 
A and 91.4% of L-^XP-^. 

3.2. Two-domain analysis in a 2-D model problem. The spectral analysis 
of the matrix S~^ — C~^ is difficult for general problems and general partitionings. 
In the simplest case when the matrix A originates from a 2-D Laplacian on a regular 
grid, discretized by centered differences, and it is partitioned into 2 subdomains, the 
analysis becomes feasible. The goal of this section is to show that the eigenvalues of 
X and decay rapidly. 

Assume that —A is discretized on a grid 17 of size nx x {2ny + 1) with Dirichlet 
boundary conditions and that the ordering is major along the x direction. The grid is 
partitioned horizontally into three parts: the two disconnected Ux x Uy grids, namely 
Hi and H 2 , which are the same, and the rij, x 1 separator denoted by F. See Figure 
|3.2[ a) for an illustration. Let T^, be the tridiagonal matrix corresponding to F of 
dimension Ux x Ux which discretizes —d'^ldx^. The scaling term l//i^ is omitted so 
that Tx has the constant 2 on its main diagonal and —1 on the co-diagonals. Finally, 
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we denote by A the matrix which results from discretizing —A on 17 and reordered 
according to the partitioning 17 = {17i,172,r}. In 17i and 172, the interface nodes are 
ordered at the end. Hence, A has the form: 


/Ay Ey\ 

H = I Ay , (3-6) 

\^y tJ 


where Ay corresponds to the rix x Uy grid (i.e., 17i or 172), Ey defines the couplings 
between 17i (or 172) and F, and the matrix is associated with F, for which we have 

f,=T, + 21. ( 3 . 7 ) 


Figure [3)2](b) is an illustration of the nonzero pattern of A. 


Therefore, the Schur complement associated with F in (3.6) reads 


Sr = n- 2E^A;^ 


E,, 


(3.8) 


and the eigenvalues of X and L~'^ XL~^ correspond to those of S^^Tx — I and iSp ^ — 
T“^, respectively, in this case. The coupling matrix Ey has the form Ey = {0,lx), 
where Ix denotes the identity matrix of size Ux. Clearly, the matrix Ry = Ey A~^Ey 
is simply the bottom right (corner) block of the inverse of Ay, which can be readily 
obtained from a standard block factorization. Noting that Ay is of the form 


/fx -I \ 

-I fx -I 


A 


V ~ 


\ 


-I 

-I fx) 




\ 


I) V 


-I 

Duy) 


Fig. 3.1. Illustration of the decay 
of eigenvalues of X (left) and S~^ — 
C-^ = L-'^XL-^ (right) for a 2-D 
Laplacian matrix with Ux = riy =32, 
where the domain is decomposed into 4 
subdomains (i.e., p = 4), and the size 
of S is 127. 5 eigenvectors will capture 
82.5% of the spectrum of X and 85.1% 
of the spectrum of L~^XL~^, whereas 
10 eigenvectors will capture 89.7% of the 
spectrum of X and 91.4% of the spec¬ 
trum of L~^XL~^. 
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Fig. 3.2. Illustration of the matrix A and the corresponding partitioning of the 2-D mesh. 
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(b) Nonzero pattern of the reordered matrix. 


The Di’s satisfy the recurrence: Dk = for k = 2, - • • ,ny starting with 

Di =T^. The result is that each D]. is a continued fraction in T^. As can be easily 
verified Ry is equal to D~^. The scalar version of the above recurrence is of the form: 

dk = ‘2.a — - - , fc = 2, • • • , riy , with di = 2a . 

dk-i 

The di’s are the diagonal entries of the U-matrix of an LU factorization similar to the 
one above but applied to the Uy x Uy tridiagonal matrix T that has 2a on the diagonal 
and —1 on the co-diagonals. For reasons that will become clear we replaced the matrix 
Tx by the scalar 2a. We are interested in the inverse of the last entry, i.e., d~\ Using 
Chebyshev polynomials we can easily see that d~^ = Uny-i{a)/Uny{a) where Uk{t) is 
the Chebyshev polynomial of the second kind (for details, see Appendix): 

sinh((fc-I-1) cosh“^(t)) 
sinh(cosh”^(t)) 


In terms of the original matrix Ay, the scalar a needs to be substituted by T2,/2 = 
I + Tx/2. In the end, the matrix 5'“^ — C~^ = ^ — T~^ is a rational function of 

Tx/2. We denote this rational function by sit), i.e., 5'p ^ — T~^ = s(Tx/2.) and note 
that s is well-defined in terms of the scalar a. Indeed, from the above: 


s{a) 


1 1 


2a- 2 


Ur.y-l{a) 

U„y(a} 


2a 


Uny-l{^) 

a{2aUny{,a) - 2C7„^_i(a)) 


Uny-l{o.) 

a [U„,,+i(a) - Uny-i{a)] 


Everything can now be expressed in terms of the eigenvalues of Tx/2 which are 

kiT 


= 1 + 2 sin 


We can then state the following. 


2{nx + 1 ) 


, A: = 1, • 


(3.9) 


Proposition 3.3. Letrjk be defined in (3.9) and 9k = cosh ^{r]k), k = 1, - ■ ■ ,n, 


















(3.10) 


Then, the eigenvalues 7 ^ of o,re given by 


Ik = 


sinh(ny0fe) 


rjk [sinh((nj; + 2)0k) - sinh(ny6lfe)] 


, k =!,■■■ ,na; . 


Note that we have e®*’ = ? 7 fc + \/r?^ — 1 and sinh(n0fc) = [( 77 ^ + 1 / 77 ^ — 1)” — ( 77 ^ + 
— l)“"]/2, which is well approximated by {rjk + 1 / 77 ^ — l)"/2 for a large n. In 
the end, assuming Uy is large enough, we have 


7/c 


Vk 


ivk + \/vl- 1 )^ - 1 


277fe 


(»7fc-l) + 7fe\/77fc-l 


(3.11) 


This shows that for those eigenvalues of Tj, that are close to one, we would have a 
big amplihcation to the value 1 / 77 ^. These eigenvalues correspond to the smallest 
eigenvalues of T^,. We can also show that 


Ik 




1 

Vk 


and for the eigenvalues Cfc of S'p — I, we have 

Vk 


Ck — ‘2'Vk^k 




- 1 . 


An illustration of 7 ^,, Cfc and l/vk is shown in Figure 3.3 


Fig. 3.3. Illustration of the decay of the eigenvalues -fk of the matrix S~^ — C~^ and the 
eigenvalues of the matrix S~^C — I, and l/r]k for —A on a 2-D grid of size X {2ny + 1) with 
Ux = 65, Uy = 32, which is partitioned into 2 subdomains. 




4. Schur complement based preconditioning with low rank corrections. 


The goal of this section is to build a preconditioner for a matrix of the form (2.3) as 
obtained from the DD method. The preconditioning matrix M is of the form 


M = 


•T n-1 


B 


B 


(4.1) 


where S is an approximation to S. The above is approximate factorization of (2.4) 


whereby (only) S is approximated. In fact we will approximate directly the inverse of S 
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instead of S by exploiting low-rank properties. Specifically, we seek an approximation 
of the form S~^ = C~^ + LRC. From a practical point of view, it will be difficult to 
compute directly an approximation to the matrix S~^ — since we do not (yet) 
have an efficient means for solving linear systems with the matrix S. Instead we will 
extract this approximation from that of the matrix X defined in Section 3.1 see (3.4). 
Recall the expression (3.1) and the eigen-decomposition of H in (3.2), which yield, 

S = L{I-UAU^)L^ = LUil - A)U^L^. (4.2) 

The inverse of S is then 


S-^ = L-^U{I - A)-iC/^L-\ 
which we write in the form, 

S'-i = L-'^ (/ -f U[{I - A)-^ - L-b 
Now, assuming that H has an approximation of the following form, 

H « UAU'^, 

we will obtain the following approximation to S~^\ 

S-^ = L-'^U{I - A)-it/'^L-\ 

L-'^U[{I - A)-i - I]U'^L-\ 


= C 


-1 


(4.3) 

(4.4) 

(4.5) 

(4.6) 

(4.7) 


Proposition 4.1. Let S and H be defined by \3.1\ and (3.2) respectively and let 
S = diag((Ti,..., Gs) with the Ui ’s defined by 




i = l,...,s. 


1-A, 

Then, the eigendecomposition of SS~^ is given by: 

SS-^ = (Lt/)S(Lt/)-b 


(4.8) 


(4.9) 


Proof. From (4.2) and (|4.6[), we have 


SS-^ = LU{I - A)U^L'^L-^{U{I - A)-^U^)L-^ 

= {LU){I - A){I - A)-^{U^L-^) = (LC/)E(L[/)-b 


□ 

The simplest selection of A is the one that ensures that the k largest eigenvalues 
of {I — A)~^ match the largest eigenvalues of (/ — A)“^. Assume that the eigenvalues 
of H are Ai > A 2 > • • • > Ag, which means that the diagonal entries A^ of A are 
selected such that 


J Ai if i <k 

y 0 otherwise 


Proposition 4.1 indicates that in this case the eigenvalues of SS ^ are 


f 1 \i i < k 
1 — Ai otherwise 


(4.10) 
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Thus, we can infer that in this situation k eigenvalues of SS~^ will take the value one 
and the other s — k eigenvalues Ci satisfy 0 < 1 — A^+i < cr^ < 1 — Ag < 1. 

Another choice for A, inspired by [23], will make the eigenvalues of SS~^ larger 
than or equal to one. Consider defining A such that 


A,; = 


A. 


if i < k 
if i > k 


Then, from (4.8) the eigenvalues of SS ^ are 


(i-A)/(l-0) 


if 

if 


i < k 
i > k 


(4.11) 


(4.12) 


The earlier definition of A^ in (4.10) which truncates the lowest eigenvalues of H to 


zero corresponds to selecting 9 = 0. Note that for i > k, the eigenvalues can be made 
greater than or equal to one by selecting A^+i < d < 1. In this case, the eigenvalues 
(Ti for z > fc which are equal to ai = {1 — Xi)/{I — 9) belong to the interval 


1-Ag 

1-9 


c 


1 , 


1 


1 - 9 


(4.13) 


Thus, the spectral condition number of the preconditioned matrix is (1 — Ag)/(1 — 9). 
The choice leading to the smallest 2-norm deviation is letting 9 = A^+i. One question 
that may be asked is how does the condition number k = max ai / min vary when 
9 varies between 0 and 1? 

First observe that a general expression for the eigenvalues of SS~^ is given by 


(4.12) regardless of the value of 9. When A^+i < d < 1, we just saw that the spectral 


condition number is equal to (1 — Ag)/(1 — 9). The smallest value of this condition 
number is reached when 9 takes the smallest value which, recalling our restriction 
Afe+i < 0 < 1, is 0 = Afc+i. There is a second situation, which corresponds to when 
As < 0 < Afe+i. Here the largest eigenvalue is still (1 — As)/(1 — 0) which is larger 
than one. The smallest one is now smaller than one, which is (1 — Afe_|_i)/(1 — 0). 
So the condition number now is again (1 — As)/(1 — A^+i), which is independent of 
0 in the interval [Ag, A^+i]. The third and final situation corresponds to the case 
when 0 < 0 < Ag. The largest eigenvalue is now one, because (1 — As)/(1 — 0) < 1, 
while the smallest one is still (1 — Afc+i)/(l — 0). This leads to the condition number 
(1 — 0)/(l — Afc_|_i) and the smallest spectral condition number for 0 in this interval is 
reached when 0 = Ag leading to the same optimal condition number (1—Ag)/(1—A^+i). 
This result is summarized in the following proposition. 

Proposition 4.2. The spectral condition number k (9) of SS~^ is equal to 


1-1 


c(0) = 


1 — Afc+i 
1-Ag 
1 ~ Afc+i 
1-Ag 


1-0 


if 0 G [0, Ag) 
if 0 G [Ag, A/c+i] 
if 0 G (Afc+i, 1) 


(4.14) 


It has a minimum value of (1 — Ag)/(1 — A^+i), which is reached for any 9 in the 
second interval. 
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Fig. 4.1. Illustration of the condition 
number k{ 6) for the case of a 2-D Laplacian 
matrix with Ux = ny = 256 and the number 
of the subdomains p = 2, where 64 eigenvec¬ 
tors are used {i.e.,k = 64). As = .05719, 
Xk+i = .36145, and the optimal condition 
number is k = 1.4765. 


Figure 4.1 shows the variation of the condition number k{9) as a function of 
9^ for a 2-D Laplacian matrix. One may conclude from this result that there is no 
reason for selecting a particular 6 G [As,Afc+i] over another one as long as 9 belongs 
to the middle interval, since the spectral condition number k,{9) is the same. In fact, 
in practice when approximate eigenpairs are used, that are computed, for example, 
by the Lanczos procedure, the choice 6 = A^+i often gives better performance than 
= As in this context because for the former choice, the perturbed eigenvalues are less 
likely to be close to zero. An example can be found in Figure [4^ which shows that 
when using accurate enough eigenpairs, both choices of 9 will give the same condition 
number (which is also the optimal one), whereas when relatively inaccurate eigenpairs 
are used, setting 9 = A/c+i can give a better condition number than that obtained 
from setting 9 = Xg. In what follows, we assume that the approximation scheme 
(4.11) is used with 9 = Afc+i, and we will denote by the related approximate 
inverse of S. 


Fig. 4.2. Illustration of the eigenvalues of SS~^ for the case of a 2-D Laplacian matrix with 
nx = Uy = 128, the number of subdomains p = 2 and the rank k = 16, such that the optimal spectral 
condition number K(d) = 3.0464, for Xs < 0 < Afc_|_i. The two top figures show the eigenvalues 
of SS~^ with the Ritz values and vectors from 80 steps of the Lanczos iterations, where for both 
choices 8 = \s and 9 = Afe+i, k{9) = 3.0464. The bottom two figures show the eigenvalues of SS~^ 
in the cases with 32 steps of the Lanczos iterations, where k{Xs) = 7.8940 while K(Afc+i) = 6.6062. 



0 0.2 0.4 0.6 0.8 1 0 0.5 1 1.5 2 2.5 3 


(a) 0 = As, 80 Lanczos steps (b) 9 = Xk+i^ 80 Lanczos steps 


(c) 0 = As, 32 Lanczos steps (d) 9 = X^+i, 32 Lanczos steps 



From an implementation point of view, it is clear that only the k largest eigenval- 
ues and the associated eigenvectors as well as the (fc + l)-st largest eigenvalue of the 
matrix are needed. We prove this result in the following proposition. 

Proposition 4.3. Let Zk be the eigenvectors of C~^E"^B~^E associated with 
the k largest eigenvalues, and let 9 = A^+i. The following expression for holds: 

Skp = [(^ - - (1 - zl- 
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(4.15) 








































































































































Proof. We write U = [Uk, W], where Uk = [ui,..., Uk] contains the eigenvectors 
of H associated with the largest k eigenvalues and W contains the remaining columns 
Ufc+i, • • • ,Us. Note that W is not available but we use the fact that WW"^ = I—UkU^ 
for the purpose of this proof. With this, (4.7) becomes: 


= C-i + Zfc [(/ - Afe)-1 -I]z^+ [(1 - - 1 ] L-^WW^L-^ 

= C-i + Zfe [(/ - Afe)-1 -I] Zip [(1 - 9)-^ - 1] L-^(J - UkUl)L-^ 
= + Zk [{I - Afe)-i - (1 - 9)-^l] Zl. 


□ 

In a paper describing a similar technique, Grigori et al. HD, suggest another 
choice of A which is: 


f 1 —(1 —Ai)/e if i<k 
I 0 otherwise ’ 


(4.16) 


where e is a parameter. Then the eigenvalues afs are 


J £ if i < k 

( 1 — Ai otherwise ’ 


Note that the first choice in (4.10) is a special case of (4.16) when e 
transformed eigenvalues as 


1. Writing the 


{e, 1 — Afc+i, 1 — Xk+ 2 , ■ ■ ■ , 1 — As}, 

the authors stated that the resulting condition number is k = (1 — As)/e, with an 
implied assumption that e < 1 — A^+i. In the cases when 1 — A^+i < e < 1 — As, the 
spectral condition number is the same as above, i.e., equal to (1 — As)/ (1 — A^+i). On 
the other hand, when 0 < e < 1 — A^+i, then the condition number is now (1 — As)/e, 
and the best value will be reached again for £ = 1 — Afc+i, which leads to the same 
condition number as above. 

In all the cases, if we want to keep the spectral condition number of the matrix 
SS~^, which is At = (1 — As)/(1 — A^+i), bounded from above by a constant K, we 
can only guarantee this by having k large enough so that 1/(1 — Afc+i) < K, or 
equivalently, A^+i < 1 — 1/K. In other words, we would have to select the rank k 
large enough such that 


Afc+i ■ (4-17) 

Of course, the required rank k depends primarily on the eigenvalue decay of the A^’s. 
In general, however, this means that the method will require a sufficient number of 
eigenvectors to be computed and that this number must be increased if we wish to 
decrease the spectral condition number to a given value. For problems arising from 
PDFs, it is expected that in order to keep the spectral condition number constant, k 
must have to be increased as the problem sizes increase. 
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5. Practical implementation. In this section, we will address the implemen¬ 
tation details for building and applying an SLR preconditioner. 


5.1. Computation of the low-rank approximations. One of the key issues 
in setting up the preconditioner (4.1) is to extract a low-rank approximation to the 
matrix C~^E^B~"^E. Assuming that C is SPD, we can use the Lanczos algorithm 
[IQIIIH] on the matrix L ^EL where L is the Cholesky factor of C. In 

the case when only a few extreme eigenpairs are needed, the Lanczos algorithm can 
efficiently approximate these without forming the matrix explicitly since the procedure 
only requires the matrix for performing the matrix-vector products. As is well-known, 
in the presence of rounding errors, orthogonality in the Lanczos procedure is quickly 
lost and a form of reorthogonalization is needed in practice. In our approach, the 
partial reorthogonalization scheme [25l [28] is used. The cost of this step will not be 
an issue to the overall performance when a small number of steps are performed to 
approximate a few eigenpairs. 


5.2. The solves with B and C. A solve with the matrix B amounts to p local 
and independent solves with the matrices Bi, i = 1, ■ ■ ■ ,p. These can be carried out 
efficiently either by a direct solver or by Krylov subspace methods with more tradi¬ 
tional ILU preconditioners for example. On the other hand, the matrix C, which is 
associated with the interface unknowns, often has some diagonal dominance proper¬ 
ties for problems issued from discretized PDEs, so that an ILU-based method can 
typically work well. However, for large indefinite problems, especially ones issued 
from 3-D PDEs, the interface corresponds to a large 2-D problem, and so a direct fac¬ 
torization of C will be expensive in terms of both the memory and the computational 
cost. An alternative is to apply the SLR method recursively. This requires that the 
interface points be ordered so that C will have the same structure as the matrix A. 
That is the leading block is block diagonal, a property satisfied by the Hierarchical 
Interface Decomposition (HID) method discussed in [T5|. This essentially yields a 
multilevel scheme of the SLR method, which is currently being investigated by the 
authors. In the current SLR method, we simply use ILU factorizations for C. 


5.3. Improving an SLR preconditioner. One of the main weaknesses of stan¬ 
dard, e.g., ILU-type, preconditioners is that they are difficult to update. For example, 
suppose we compute a preconditioner to a given matrix and find that it is not ac¬ 
curate enough to yield convergence. In the case of ILUs we would have essentially 
to start from the beginning. However, for SLR, improving a given preconditioner is 
essentially trivial. For example, the heart of the SLR method consists of obtaining a 
low-rank approximation the matrix H defined in (3.21. Improving this approximation 
would consist in merely adding a few more vectors (increasing k) and this can be 
easily achieved in a number of ways, e.g., by resorting to a form of deflation, without 
having to throw away the vectors already computed. 


6. Numerical experiments. The experiments were conducted on a machine 
at Minnesota Supercomputing Institute, equipped with two Intel Xeon X5560 proces¬ 
sors (8 MB Cache, 2.8 GHz, quad-core) and 24 GB of main memory. A preliminary 
implementation of the SLR preconditioner was written in C/C-l—1-, and the code was 
compiled by the Intel C compiler using the -02 optimization level. BLAS and LA- 
PACK routines from Intel Math Kernel Library were used to enhance the performance 
on multiple cores. The thread-level parallelism was realized by OpenMP [21|. 

The accelerators used were the conjugate gradient (CG) method for the SPD 
cases, and the generalized minimal residual (GMRES) method with a restart dimen- 
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sion of 40, denoted by GMRES(40) for the indefinite cases. Three types of pre¬ 
conditioning methods were compared in our experiments: the incomplete Cholesky 
factorization with threshold dropping (ICT) or the incomplete LDL factorization 
with threshold dropping (ILDLT), the restricted additive Schwarz (RAS) method [S] 
(with one-level overlapping), and the SLR method. For the RAS method, we used 
ICT/ILDLT as the local solvers. Moreover, since the RAS preconditioner is nonsym- 
metric even for a symmetric matrix, GMRES(40) was used with it. 

For all the problems, we used the graph partitioner PartGraphRecursive from 
Metis [MITT] to partition the domains. The time for the graph partitioning will not 
be included in the time of building the preconditioners. For each subdomain i, matrix 
Bi was reordered by the approximate minimum degree (AMD) ordering O [T] to 
reduce fill-ins and then ICT/ILDLT was used as a local solver. In the SLR method, 
the matrix C, which is assumed to be SPD, was factored by ICT. In the Lanczos 
algorithm, we set the maximum number of Lanczos steps as five times the number of 
requested eigenvalues. 

Based on the experimental results, we can state that in general, building an 
SLR preconditioner, especially those using larger ranks, requires much more time 
than an ICT/ILDLT preconditioner or an RAS preconditioner that requires similar 
storage. Nevertheless, experimental results indicated that the SLR preconditioner is 
more robust and can achieve great time savings in the iterative phase, expensive 
but effective preconditioners may be justified because their cost is amortized. In this 
section, we first report on the results of solving symmetric linear systems from a 2- 
D/3-D PDF on regular meshes. Then, we will show the results for solving a sequence 
of general sparse symmetric linear systems. For all the cases, the iterations were 
stopped whenever the residual norm had been reduced by 8 orders of magnitude or 
the maximum number of iterations allowed, which is 300, was exceeded. The results 
are summarized in Tables |6.2[ |6.3| and |6.5[ where all times are reported in seconds. 
When comparing the preconditioners, the following factors are considered: 1) fill- 
ratio, i.e., the ratio of the number of nonzeros required to store a preconditioner to 
the number of nonzeros in the original matrix, 2) time for building preconditioners, 
3) the number of iterations and 4) time for the iterations. In all tables, ‘F’ indicates 
non-convergence within the maximum allowed number of steps. 

6.1. 2-D and 3-D model problems. We examine problems from a 2-D/3-D 
PDF, 


-Art — cu = f in D, 

u = 4>[x) on dVl, 


( 6 . 1 ) 


where D = (0,1)^ and fl = (0,1)^ are the domains, and 9D is the boundary. We take 
the 5-point or 7-point centered difference approximation on the regular meshes. 

To begin with, we examine the required ranks of the SLR method in order to 
bound the spectral condition number of the matrix SS~^ by a constant K. Recall 


from (4.17) that this requires that the (fc-l-l)-st largest eigenvalue, A^+i, of the matrix 
C~^E^ B~^E be less than 1 — 1/K. The results for 2-D/3-D Laplacians are shown 
in Table |6.1[ From there we can see that for the 2-D problems, the required rank is 


about doubled when the step-size is reduced by half, while for the 3-D cases, the rank 
needs to be increased by a factor of roughly 3.5. 


In the next set of experiments, we solve (6.1) with c = 0, so that the coefficient 


matrices are SPD and we use the SLR preconditioner along with the CG method. 
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Table 6.1 

The required ranks of the SLR method for bounding the condition number of 55“^ by K for 
2-D/3-D Laplacians, The number of subdomains used is 8 for the 2-D case and 32 for the 3-D case. 


Grid 

rank 

a: « 33 

Grid 

rank 

K ^12 

128^ 

3 

25^ 

1 

256^ 

8 

CO 

0 

4 

512^ 

20 

643 

12 

10242 

42 

1003 

42 


Numerical experiments were carried out to compare the performance of the SLR 
preconditioner with the ICT and the RAS preconditioners. The results are shown 
in Table 6.2 The sizes of the grids, the fill-ratios (fill), the numbers of iterations 


(its), the time for building the preconditioners (p-t) and the time for iterations (i-t) 
are tabulated. For the SLR preconditioners, the number of subdomains (nd) and the 
rank (rk) are also listed. The fill-ratios of the three preconditioners were controlled to 
be roughly equal. For all the cases tested here and in the following sections, the RAS 
method always used the same numbers of subdomains as did the SLR method. The 
ICT factorizations were used for the solves with the matrices B and C in the SLR 
method. As shown in Table [6^ we tested the problems on three 2-D grids and three 
3-D grids of increasing sizes, where for the RAS method and the SLR method, the 
domain was partitioned into 32, 64 and 128 subdomains respectively, and the ranks 
16 or 32 were used in the SLR preconditioners. 


Table 6.2 

Comparison among the ICT, the RAS and the SLR preconditioners for solving SPD linear 
systems from the 2-D/3-D PDE in l|6.1|l with c = 0 along with the CG and the GMRES method. 


Grid 

fill 

IGT-GG 
p-t its 

i-t 

RAS-GMRES 
fill p-t its i-t 

nd 

rk 

SLR-GG 
fill p-t 

its 

i-t 

2562 

4.5 

.074 

51 

.239 

4.5 

.088 

129 

.281 

32 

16 

4.3 

.090 

67 

.145 

5122 

4.6 

.299 

97 

1.93 

4.8 

.356 

259 

2.34 

64 

32 

4.9 

.650 

103 

1.01 

10242 

5.4 

1.44 

149 

14.2 

6.2 

1.94 

F 

12.8 

128 

32 

5.7 

5.23 

175 

7.95 

403 

4.4 

.125 

25 

.152 

4.5 

.145 

36 

.101 

32 

16 

4.0 

.182 

31 

.104 

643 

6.8 

.976 

32 

1.24 

6.2 

.912 

49 

.622 

64 

32 

6.3 

1.52 

38 

.633 

1003 

7.3 

4.05 

47 

7.52 

6.1 

3.48 

82 

4.29 

128 

32 

6.5 

5.50 

67 

4.48 


Compared with the ICT and the RAS preconditioners, building an SLR precon¬ 
ditioner required more CPU time (up to 4 times more for the largest 2-D case). For 
these problems, the SLR-CG method achieved convergence in slightly more iterations 
than those with the ICT preconditioner, but SLR still achieved performance gains in 
terms of significantly reduced iteration times. The CPU time for building an SLR 
preconditioner is typically dominated by the cost of the Lanczos algorithm. Further¬ 
more, this cost is actually governed by the cost of the solves with B/s and C, which 
are required at each iteration. Moreover, when the rank k used is large, the cost of 
reorthogonalization will also become significant. Some simple thread-level parallelism 
has been exploited using OpenMP for the solves with the B/s, which can be performed 
independently. The multi-threaded MKL routines also helped speedup the vector op- 


16 

















erations in the reorthogonalizations. We point out that there is room for substantial 
improvements in the performance of these computations. In particular they are very 
suitable for the SIMD type parallel machines such as computers equipped with GPUs 
or with the Intel Xeon Phi processors. These features have not yet been implemented 
in the current code. 

Next, we consider solving the symmetric indefinite problems by setting c > 0 


in (6.1), which corresponds to shifting the discretized negative Laplacian (a positive 
definite matrix) by subtracting si with a certain s > 0. In this set of experiments, 
we solve the 2-D problems with s = O.OI and the 3-D problems with s = 0.05. The 
SLR method is compared to ILDLT and RAS with GMRES(40). 


Table 6.3 

Comparison among the ILDLT, the RAS and the SLR preconditioners for solving symmetric 
indefinite linear systems from the 2-D/3-D PDE in with c > 0 along with the GMRES method. 


Grid 

ILDLT-GMRES 
fill p-t its i-t 

RAS-GMRES 
fill p-t its i-t 

nd 

SLR-GMRES 
rk fill p-t 

its 

i-t 

256^ 

8.2 

.174 

F 

6.3 

.134 

F 

8 

32 

6.4 

.213 

33 

.125 

512^ 

8.4 

.702 

F 

8.4 

.721 

F 

16 

64 

7.6 

2.06 

93 

1.50 

10242 

12.6 

5.14 

F 

19.4 

21.6 

F 

8 

128 

10.8 

24.5 

50 

4.81 

40^ 

6.9 

.249 

54 .540 

6.7 

.254 

99 .300 

64 

32 

6.7 

.490 

23 

.123 

643 

9.0 

1.39 

F 

11.8 

2.16 

F 

128 

64 

9.1 

3.94 

45 

1.16 

1003 

14.7 

10.9 

F 

11.7 

14.5 

F 

128 

180 

14.6 

62.9 

88 

13.9 


Results are shown in Table 6.3 For most problems, the ILDLT/GMRES and the 
RAS/GMRES method failed even with high fill-ratios. In contrast, the SLR method 
appears to be more effective, achieving convergence for all cases, and great savings in 
the iteration time. In contrast with the SPD case, a few difficulties were encountered. 
For the 2-D problems, an SLR preconditioner with a large number of subdomains 
(say, 64 or 128) often failed to converge. As a result the sizes of the subdomains 
were still quite large and factoring the matrices B/s was quite expensive in terms 
of both the GPU time and the memory requirement. Furthermore, for both the 2-D 
and 3-D problems, approximations of higher ranks were required compared to those 
used in the SPD cases. This only increased the memory requirement slightly, but it 
significantly increased the CPU time required by the Lanczos algorithm. An example 
is the largest 3-D problem in Table |6)^ where a rank of 180 was used. 


6.2. General matrices. We selected 15 matrices from the University of Florida 
sparse matrix collection [S] for the following tests. Among these 10 matrices are SPD 
matrices and 5 matrices are symmetric indefinite. Table |6^ lists the name, the order 
(N), the number of nonzeros (NNZ), the positive definiteness, and a short description 
for each matrix. If the actual right-hand side is not provided, the linear system was 
obtained by creating an artificial one as b = Ae, where e is a random vector of unit 
2-norm. 


Table 6.5 shows the performance of the three preconditioning methods. The CG 
method and the GMRES method with the SLR preconditioner achieved convergence 
for all the cases, whereas for many cases, they failed to converge with the ICT/ILDLT 
and the RAS preconditioners. Similar to the experiments for the model problems, the 
SLR preconditioner often required more CPU time to build than the other two coun- 
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Table 6.4 

Names, orders (N), numbers of nonzeros (NNZ) and positive definiteness of the test matrices. 


MATRIX 

N 

NNZ 

SPD 

DESCRIPTION 

Williams / cant 

62,451 

4,007,383 

yes 

FEM cantilever 

UTEP / dubcova2 

65,025 

1,030,225 

yes 

2-D/3-D PDE problem 

UTEP / dubcovaS 

146,689 

3,636,643 

yes 

2-D/3-D PDE problem 

Rothberg/cfdl 

70,656 

1,825,580 

yes 

CFD problem 

Rothberg/cfd2 

123,440 

3,085,406 

yes 

CFD problem 

Schmid/thermall 

82,654 

574,458 

yes 

thermal problem 

Schmid / thermal2 

1,228,045 

8,580,313 

yes 

thermal problem 

Wissgott / parabolic_fem 

525,825 

3,674,625 

yes 

CFD problem 

CEMW/tmt_sym 

726,713 

5,080,961 

yes 

electromagnetics problem 

McRae/ecology2 

999,999 

4,995,991 

yes 

landscape ecology problem 

Lin/Lin 

256,000 

1,766,400 

no 

structural problem 

Cote/vibrobox 

12,328 

301,700 

no 

vibroacoustic problem 

Cunningham / qaSfk 

66,127 

1,660,579 

no 

3-D acoustics problem 

Koutsovasilis/F2 

71,505 

5,294,285 

no 

structural problem 

GHS indef/helm2d03 

392,257 

2,741,935 

no 

2-D Helmholtz problem 


terparts but it required fewer iterations for most of the cases and achieved significant 
CPU time savings in the iteration phase for almost all the cases (the exception is 
qaSfk, for which the RAS method gave the best iteration time). 

7. Conclusion. This paper presented a preconditioning method, named SLR, 
based on a Schur complement approach with low-rank corrections for solving symmet¬ 
ric sparse linear systems. Like the method in |21j . the new method uses a low-rank 
approximation to build a preconditioner, exploiting some decay property of eigenval¬ 
ues. The major difference with [3T] is that SLR is not recursive. It focuses on the 
Schur complement in any standard domain decomposition framework and tries to ap¬ 
proximate its inverse by exploiting low-rank approximations. As a result, the method 
is much easier to implement. 

Experimental results indicate that in terms of iteration times, the proposed pre¬ 
conditioner can be a more efficient alternative to the ones based on incomplete factor¬ 
izations, namely, the ILU-type or block ILU-type methods for SPD systems. More¬ 
over, this preconditioner appears to be more robust than the incomplete factorization 
based methods for indefinite problems. Recall that ILU-based methods often deliver 
unstable, and in some cases quite dense factors when the original matrix is highly 
indefinite, and this renders them ineffective for such cases. In contrast SLR is essen¬ 
tially a form of approximate inverse technique and as such it is not prone to these 
difficulties. On the negative side, building an SLR preconditioner can be time con¬ 
suming, although several mitigating factors should be taken into account. These are 
similar to those pointed out in |2I| which also exploits low-rank approximation and 
we summarize them here. The first is that a big part of the computations to build 
the SLR preconditioner can be easily vectorized and this is especially attractive for 
massively parallel machines, such as those equipped with GPUs or with the Intel 
Xeon Phi processors. The set-up phase is likely to be far more advantageous than a 
factorization-based one which tends to be much more sequential, see, e.g., [22] • The 
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Table 6.5 

Comparison among the ICT or the ILDLT, the RAS and the SLR preeonditioners for solving 
general symmetric linear systems along with the CG or GMRES(40) method. 


MATRIX 

fill 

ICT/ILDLT 
p-t its i-t 

fill 

RAS 

p-t its i-t 

nd 

rk 

SLR 
fill p-t 

its 

i-t 

cant 

4.7 

3.87 

150 

9.34 

5.9 

6.25 

F 

32 

90 

4.9 

5.58 

82 

1.92 

dubcova2 

2.7 

.300 

47 

.492 

2.8 

.489 

60 .223 

16 

32 

2.8 

.280 

19 

.080 

dubcovaS 

2.2 

1.01 

46 

1.44 

2.1 

1.46 

59 .654 

16 

32 

1.8 

.677 

19 

.212 

cfdl 

6.9 

2.89 

295 

11.9 

8.3 

3.04 

F 

32 

32 

6.9 

2.13 

64 

1.07 

cfd2 

9.9 

13.5 

F 

- 

8.9 

7.88 

F 

32 

80 

00 

bo 

7.62 

178 

5.75 

thermall 

5.1 

.227 

68 

.711 

5.0 

.348 

F 

16 

32 

5.0 

.277 

59 

.231 

thermal2 

6.9 

5.10 

178 

39.3 

7.1 

8.46 

F 

64 

90 

6.6 

14.8 

184 

15.0 

para_fem 

6.1 

2.04 

58 

4.68 

6.3 

3.17 

236 6.11 

32 

80 

6.9 

6.05 

86 

3.03 

tmt_sym 

6.0 

1.85 

122 

11.6 

6.2 

3.67 

F 

64 

80 

5.9 

6.61 

127 

5.23 

ecology2 

8.4 

2.64 

142 

18.5 

9.5 

4.78 

F 

32 

96 

8.0 

12.3 

90 

5.58 

Lin 

11 

1.93 

F 

- 

19 

4.61 

F 

64 

64 

9.9 

3.78 

73 

1.75 

vibrobox 

6.0 

.738 

F 

- 

7.0 

.513 

F 

4 

64 

3.8 

.437 

226 

.619 

qaSfk 

4.2 

.789 

22 

.507 

4.6 

1.14 

35 .273 

16 

64 

4.5 

1.94 

28 

.309 

F2 

5.1 

9.66 

F 

- 

5.4 

9.43 

F 

8 

80 

3.9 

6.25 

72 

2.14 

helm2d03 

14 

14.4 

F 

- 

11 

7.20 

F 

16 

128 

11 

11.9 

63 

2.63 


second is that there are situations in which many systems with the same matrix must 
be solved in which case more expensive but more effective preconditioners may be 
justified as their cost will be amortized. Finally, these preconditioners are more easily 
updatable than traditional ILU-type preconditioners, see Section 5.3 for a discussion. 

Appendix. Let 


and 
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be the LU factorization of T. We are interested in d~^. If we solve Tx = Cn where e„ 
is the nth canonical basis vector for K”, and x = then clearly ^„_i = 

l/d„ which is what we need to calculate. Let = Uk{a)^ for A: = 0,1, • • • , n— 1, where 
Uk is the A:-th degree Chebyshev polynomial of the second kind. These polynomials 
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satisfy the recurrence relation: Uk+i{t) = 2tUk{t) — Uk-i{t), starting with Uo{t) = 1 
and Ui{t) = 2t. Then clearly, equations fc = 1, • • • , n — 1 of the system Tx = e„ are 
satisfied. For the last equation we get f7„(a) instead of the wanted value of 1. Scaling 
X by Un{a) yields the result l/dn = Cn-i = Un-i{,a)/Un{,a). 
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